Kivan Polimis
April 04, 2022
Application Programming Interface (API)
“APIs are mechanisms that enable two software components to communicate with each other using a set of definitions and protocols. For example, the weather bureau’s software system contains daily weather data. The weather app on your phone ‘talks’ to this system via APIs and shows you daily weather updates on your phone.” - Amazon Web Services
In plain English, an API is a standard way for developers to communicate with software applications to request and send data. An API is composed of a set of definitions (what requests and responses look like) and protocols (how they are exchanged).
“We live in the world of API economy, where new software is built by leveraging many different commercial or open source software components,” says [Bhanu] Singh [VP of Engineering] of OpsRamp. “For example, Uber uses a variety of software systems for things like payment, location, maps, and traffic, all of which rely on APIs to communicate.”
“Without APIs, developers would need to have intimate knowledge of the internal workings of an application to be able to extend its functionality,” says [Glenn] Sullivan [co-founder] at SnapRoute. “Instead, APIs give developers a way of collaborating to build more intricate systems of applications without having to work for the same company or even know each other.”
APIs have helped enable the automation boom in the software pipeline and elsewhere in the IT portfolio, for example. And again, in doing so, they reduce manual, repetitive, and often costly effort.
https://www.json.org/json-en.html
JSON (JavaScript Object Notation) is a lightweight data-interchange format.
#' from: https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
library(jsonlite)
json <-
'[
{"Name" : "Mario", "Age" : 32, "Occupation" : "Plumber"},
{"Name" : "Peach", "Age" : 21, "Occupation" : "Princess"},
{},
{"Name" : "Bowser", "Occupation" : "Koopa"}
]'
prettify(json, indent = 4)

## [
## {
## "Name": "Mario",
## "Age": 32,
## "Occupation": "Plumber"
## },
## {
## "Name": "Peach",
## "Age": 21,
## "Occupation": "Princess"
## },
## {
##
## },
## {
## "Name": "Bowser",
## "Occupation": "Koopa"
## }
## ]
##
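Beyond pretty-printing, jsonlite can also convert a JSON array of objects directly into a data frame. A minimal sketch, reusing the `json` string defined above:

```r
library(jsonlite)
#' fromJSON() simplifies a JSON array of objects into a data frame;
#' fields missing from an object (e.g. Bowser's Age) become NA
characters <- fromJSON(json)
#' a 4-row data frame with columns Name, Age, Occupation
characters
```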
REST stands for Representational State Transfer.
REST APIs offer four main benefits:
The steps to implement a new API include:
New web APIs can be found on API marketplaces and API directories.
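A minimal sketch of a REST request from R might look like the following; the endpoint URL and query parameters here are hypothetical placeholders, not a real service:

```r
library(httr)
library(jsonlite)
#' GET a resource from a hypothetical REST endpoint
response <- GET("https://api.example.com/v1/items", query = list(limit = 10))
#' check the HTTP status code (200 means success)
status_code(response)
#' parse the JSON response body into an R object
items <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
```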
Socrata, acquired by Tyler Technologies in 2018, has an “Open Data API [which] allows you to programmatically access a wealth of open data resources from governments, non-profits, and NGOs around the world.” - Socrata
https://socrataapikeys.docs.apiary.io/#introduction/why-use-api-keys?
Wait a second! Authentication is only necessary when accessing datasets that have been marked as private or when making write requests (PUT, POST, and DELETE). For reading datasets that have not been marked as private, simply use an application token.
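For public datasets, then, a read needs only an application token. A sketch (the token value is a placeholder you would replace with your own):

```r
library(RSocrata)
#' public (non-private) datasets can be read with just an app token;
#' no email/password is needed for read-only access
df <- read.socrata(
  "https://data.cdc.gov/resource/bi63-dtpu.json",
  app_token = "YOUR_APP_TOKEN"  #' hypothetical placeholder
)
```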
#' This chunk will only run if you have an appropriately formatted file,
#' `credentials/socrata_app_credentials.yml`, with valid CDC API credentials;
#' see `credentials/example_socrata_app_credentials.yml` for the 3 essential fields in the .yml file
library(here)
library(yaml)
library(RSocrata)
socrata_app_credentials <- yaml.load_file(here("credentials/socrata_app_credentials.yml"))
#' Yearly Counts of Deaths by State and Select Causes, 1999-2017
#' https://data.cdc.gov/NCHS/NCHS-Leading-Causes-of-Death-United-States/bi63-dtpu
yearly_deaths_by_state_1999_2017 <- read.socrata(
"https://data.cdc.gov/resource/bi63-dtpu.json",
app_token = socrata_app_credentials$app_token,
email = socrata_app_credentials$email,
password = socrata_app_credentials$password
)
dplyr::glimpse(yearly_deaths_by_state_1999_2017)

## Rows: 10,868
## Columns: 6
## $ year <chr> "2017", "2017", "2017", "2017", "2017", "2017", "2017…
## $ X_113_cause_name <chr> "Accidents (unintentional injuries) (V01-X59,Y85-Y86)…
## $ cause_name <chr> "Unintentional injuries", "Unintentional injuries", "…
## $ state <chr> "United States", "Alabama", "Alaska", "Arizona", "Ark…
## $ deaths <chr> "169936", "2703", "436", "4184", "1625", "13840", "30…
## $ aadr <chr> "49.4", "53.8", "63.7", "56.2", "51.8", "33.2", "53.6…
#' This chunk will only run if you have an appropriately formatted file,
#' `credentials/socrata_app_credentials.yml`, with valid CDC API credentials
library(dplyr)
library(lubridate)
library(scales)
library(ggplot2)
yearly_deaths_by_state_1999_2022$all_deaths <- as.numeric(yearly_deaths_by_state_1999_2022$all_deaths)
yearly_deaths_by_state_1999_2022$year <- ymd(paste0(yearly_deaths_by_state_1999_2022$year, "-01-01"))
end_date <- ymd("2021-01-01")
start_date <- ymd("1999-01-01")
us_deaths_time_series <- ggplot(data = yearly_deaths_by_state_1999_2022 %>%
filter(state_name=="United States", year<ymd("2022-01-01"))) +
  geom_point(aes(x = year, y = all_deaths), color = "darkred", size = 1.5) +
geom_vline(xintercept = ymd("2019-12-01"), linetype="dashed",
color = "black", size=1) +
scale_y_continuous(labels=comma) +
scale_x_date("", breaks = date_breaks("2 year"),
limits = c(start_date, end_date),
labels = date_format(format = "%Y")) +
theme(legend.position="none") +
labs(x = "Date", y = "Total Deaths",
title = "US Total Deaths over Time: 1999-2021") +
  annotate(x = ymd("2019-12-01"), y = +Inf, label = "COVID-19", vjust = 1, geom = "label")

#' This chunk will only run if you have an appropriately formatted file,
#' `credentials/purpleair_api_credentials.yml`, with valid PurpleAir API credentials;
#' see `credentials/example_purpleair_api_credentials.yml` for the 2 essential fields in the .yml file
require(httr)
purpleair_api_credentials <- yaml.load_file(here("credentials/purpleair_api_credentials.yml"))
headers = c(
`X-API-Key` = purpleair_api_credentials$read_key
)
result <- httr::GET(url = 'https://api.purpleair.com/v1/sensors/25999', httr::add_headers(.headers = headers))

## Response [https://api.purpleair.com/v1/sensors/25999]
## Date: 2022-04-04 22:24
## Status: 200
## Content-Type: application/json;charset=utf-8
## Size: 3.48 kB
## {
## "api_version" : "V1.0.10-0.0.12",
## "time_stamp" : 1649111052,
## "data_time_stamp" : 1649110996,
## "sensor" : {
## "sensor_index" : 25999,
## "last_modified" : 1554853637,
## "date_created" : 1549304400,
## "last_seen" : 1649110899,
## "private" : 0,
## ...
## [1] "api_version" "time_stamp" "data_time_stamp" "sensor"
## [1] 103
## [1] "sensor_index" "last_modified" "date_created" "last_seen"
## [5] "private" "is_owner" "name" "icon"
## [9] "location_type" "model" "hardware" "led_brightness"
## [13] "firmware_version" "rssi" "uptime" "pa_latency"
## [17] "memory" "position_rating" "latitude" "longitude"
## $pm2.5
## [1] 28.7
##
## $pm2.5_10minute
## [1] 27.8
##
## $pm2.5_30minute
## [1] 24.5
##
## $pm2.5_60minute
## [1] 22.1
##
## $pm2.5_6hour
## [1] 16.2
##
## $pm2.5_24hour
## [1] 15.9
##
## $pm2.5_1week
## [1] 10.9
##
## $time_stamp
## [1] 1649110899
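The fields shown above can be pulled out of the response body with httr and jsonlite. A sketch, assuming the `result` object from the GET request above (the `result_content` name is illustrative):

```r
library(httr)
library(jsonlite)
#' parse the JSON body of the PurpleAir response into a nested list
result_content <- fromJSON(content(result, as = "text", encoding = "UTF-8"))
#' top-level fields of the response
names(result_content)
#' fields describing the sensor itself
names(result_content$sensor)
#' the pm2.5 summary statistics shown above
result_content$sensor$stats
```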
Data from PurpleAir will be downloaded and processed when this script is first run. The AirSensor package has three data models: pas (spatial metadata and then-current averages for many sensors), pat (raw time series data for a single sensor), and airsensor (hourly aggregated sensor data).
Think about the following questions:
What type of AirSensor data do we need to look at which locations have a moderate to unhealthy 30-minute air quality rating in Texas?
What type of AirSensor data do we need to check air quality recorded by a sensor named “Royal Oaks Houston Tx - Outside” between 01-01-2020 and 01-15-2020?
Install and load relevant packages:
- tidyverse for data wrangling and plotting
- PWFSLSmoke for USFS monitor data access and plotting
- AirSensor for PurpleAir sensor data access and plotting
- MazamaSpatialUtils for spatial data download and utility functions
- AirMonitorPlots for advanced plots for monitors
Note that package AirMonitorPlots might not yet be available on CRAN. To install, try the devtools package (see code/install.R)
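Loading them might look like this (install with install.packages(), or devtools for packages not yet on CRAN; see code/install.R):

```r
library(tidyverse)           #' data wrangling and plotting
library(PWFSLSmoke)          #' USFS monitor data access and plotting
library(AirSensor)           #' PurpleAir sensor data access and plotting
library(MazamaSpatialUtils)  #' spatial data download and utility functions
library(AirMonitorPlots)     #' advanced monitor plots (may require a devtools install)
```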
The AirSensor package needs to know where processed data will live. For this report, we will specify a local archiveBaseDir where downloaded and processed data will live.
#' assign a name to the new local folder
archiveBaseDir <- here("data", "Australia_on_fire")
#' check if the same-named folder exists
#' if a same-named folder exists, print the warning
#' if no same-named folder, create the folder
if (file.exists(archiveBaseDir)) {
cat("The folder already exists")
} else {
dir.create(archiveBaseDir)
}

## The folder already exists
We will use the pas_createNew() function to create a pas object containing all the spatial metadata associated with PurpleAir monitors in Australia.
To create a new pas object you must first properly initialize the MazamaSpatialUtils package.
#' set package data directory
#' install required spatial data
#' initialize the package
filePath_pas <- file.path(archiveBaseDir, "pas_au.rda")
setSpatialDataDir(archiveBaseDir)
installSpatialData('NaturalEarthAdm1')
installSpatialData("CA_AirBasins")
setSpatialDataDir(archiveBaseDir)
initializeMazamaSpatialUtils()

Create a pas object and save it into an .rda file
#' Download, parse and enhance synoptic data from PurpleAir
#' and return the results as a useful tibble with class pa_synoptic
pas_au <- pas_createNew(countryCodes = "AU", includePWFSL = TRUE)
#' saving and loading the downloaded local file
#' save the synoptic data into an .rda file
save(pas_au, file = here("data", "pas_au.rda"))
#' load data from the .rda file
# pas_au <- get(load(here("data", "pas_au.rda")))

A pas object is a dataframe that contains metadata and PM 2.5 averages for many PurpleAir sensors in a designated region. Each pas object can be filtered and edited to retain whichever collection of sensors the analyst desires based on location, state, name, etc.
It is important to note that the data averages in the pas object – the numeric values for PM2.5 or temperature or humidity – are current at the time that the pas is created. pas objects can be used to quickly explore the spatial distribution of PurpleAir sensors and display some then-current values but should not be used for detailed analysis.
Create a pas object with current data in US. Save this pas file to your computer for later use.
pas_us <- pas_createNew(countryCodes = "US")
pas_tx <- pas_us %>% pas_filter(stateCode=="TX")
#' saving and loading the downloaded local file
#' load data from the .rda file
save(pas_us, file = here("data", "pas_us.rda"))
save(pas_tx, file = here("data", "pas_tx.rda"))
# pas_us <- get(load(here("data", "pas_us.rda")))
# pas_tx <- get(load(here("data", "pas_tx.rda")))

We can view the locations of each sensor and the AQI (Air Quality Index) maxima (when the pas was created) using the pas_leaflet() function.
We can also explore and utilize other PurpleAir sensor data. Check the pas_leaflet() documentation for all supported parameters. By default, pas_leaflet() will map the coordinates of each PurpleAir sensor and the hourly PM2.5 data.
Here is an example of humidity data captured from PurpleAir sensors across the state of New South Wales.
Using your US pas object, map the hourly PM2.5 data in Texas.
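One possible sketch of a solution, assuming pas_leaflet() accepts a parameter argument selecting the hourly average (check the function's documentation for the supported parameter names):

```r
#' filter the US pas object down to Texas sensors
#' and map the hourly PM2.5 averages
pas_us %>%
  pas_filter(stateCode == "TX") %>%
  pas_leaflet(parameter = "pm25_1hr")
```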
PurpleAir sensor readings are uploaded to the cloud every 120 seconds where they are stored for download and display on the PurpleAir website. After every interval, the synoptic data is refreshed and the outdated synoptic data is then stored in a ThingSpeak database. In order to access the ThingSpeak channel API we must first load the synoptic database.
https://en.wikipedia.org/wiki/ThingSpeak
Let’s look at data from multiple sensors in the Sydney area and one from Brisbane (north of Sydney) for the time period covering the Australian wildfires, December 2019 to January 2020.
gymea_bay_label <- c("Gymea Bay") #' Gymea Bay, Sydney, AU (southern Sydney)
north_sydney_label <- c("Glen Street, Milson’s Point, NSW, Australia") #' North Sydney, AU
brisbane_6th_ave_label <- c("St Lucia - 6th Ave") #' Brisbane, AU sensor
#' view unique labels in `pas_au` object
# unique(pas_au$label)
pat_gymea_bay <- pat_createNew(
pas = pas_au,
label = gymea_bay_label,
startdate = 20191229,
enddate = 20200110
)
save(pat_gymea_bay, file = here("data", "pat_gymea_bay.rda"))

A pat object is a list of two dataframes, one called meta containing spatial metadata associated with the sensor and another called data containing that sensor's time series data. Each pat object contains time series data for a temperature channel, a humidity channel, A and B PM 2.5 channels, and several other fields.
The following chunk demonstrates use of the pat_multiPlot() function to have a quick look at the data contained in a pat object. The plot shows both A and B channels as well as temperature and humidity. The plotting function is flexible and has options for choosing which channels to display, on the same or individual axes. (Type ?pat_multiPlot to learn more.)
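A sketch of that chunk, applied to the Gymea Bay sensor downloaded above:

```r
#' quick look at channels A and B plus temperature and humidity
pat_gymea_bay %>%
  pat_multiPlot(plottype = "all")
```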
Use the pas object we generated on the US to create a pat object for the sensor named "Royal Oaks Houston Tx - Outside" for the time period 2020-01-01 to 2020-01-15. Name it pat_houston and plot the pat object.
start_date <- 20200101
end_date <- 20200115
pat_houston <- pat_createNew(label = "Royal Oaks Houston Tx - Outside",
pas = pas_tx,
startdate = start_date,
enddate = end_date
)
pat_houston %>%
  pat_multiPlot(plottype = "all")

For the purposes of this exploratory example, we are focusing on Australia but maybe we want to filter even more and just look at the sensors within a certain radius of the one we chose in Sydney.
lon <- pat_gymea_bay$meta$longitude #' get the longitude of sensor "Gymea Bay"
lat <- pat_gymea_bay$meta$latitude #' get the latitude of sensor "Gymea Bay"
pas_sydney <-
pas_au %>%
#' Filter for PurpleAir sensors
#' within a specified distance from specified target coordinates.
pas_filterNear(
longitude = lon,
latitude = lat,
radius = "50 km"
  )

Using your pas object on the US, filter (pas_filter) and plot (pas_leaflet) which locations have a moderate to unhealthy 6-hour average air quality rating (pm25_6hr equal to or higher than 25) within 200 km of Houston, Texas.

pat data

We can create pat objects for sensors listed in the pas. Since the fires in Australia started, new PurpleAir sensors have been popping up left and right. Let's start by grabbing data from sensors with the longest history.
start_date <- 20191210
end_date <- 20200110
pat_chisholm <- pat_createNew(
label = "Chisholm",
pas = pas_au,
startdate = start_date,
enddate = end_date
)
pat_moruya <- pat_createNew(
label = "MORUYA HEADS",
pas = pas_au,
startdate = start_date,
enddate = end_date
)
pat_windang <- pat_createNew(
label = "Windang, Ocean Street",
pas = pas_au,
startdate = start_date,
enddate = end_date
)

In order to look for patterns, we can look at the PM2.5 data recorded on channel A from all the sensors. This chunk uses ggplot2 to view all the data on the same axis.
colors <- c("Chisholm" = "#1b9e77",
"Moruya" = "#d95f02",
"Windang" = "#7570b3")
multisensor_pm25_plot <- ggplot() +
  geom_point(data = pat_chisholm$data,
             aes(x = datetime, y = pm25_A, color = "Chisholm"), alpha = 0.5) +
  geom_point(data = pat_moruya$data,
             aes(x = datetime, y = pm25_A, color = "Moruya"), alpha = 0.5) +
  geom_point(data = pat_windang$data,
             aes(x = datetime, y = pm25_A, color = "Windang"), alpha = 0.5) +
  labs(title = "PM 2.5 channel A for multiple sensors") +
  xlab("date") +
  ylab("ug/m3") +
  scale_colour_manual(name = "Sensor", values = colors) +
  theme(legend.position = c(0.9, 0.8))

What do you find from the above graph? What other information would you check to confirm your findings?
Let’s check a few other sensors that are closer in proximity to Chisholm to see if they are also reporting abnormal values. Download and plot data on sensors “Bungendore, NSW Australia” and “Downer” together with “Chisholm” for the same time period. Make your plot easy to read. What do you find?
Are there any other factors that may impact the validity of the sensor data we observed?
Use the following ids for sensors in Pasadena and Houston to find the associated sensor labels and recreate the analysis done on the Chisholm, Moruya, and Windang sensors.
pasadena_ids <- c("98633", "99813")
houston_ids <- c("26659", "133994")
pas_tx %>%
  filter(ID %in% pasadena_ids | ID %in% houston_ids)

## # A tibble: 4 × 44
## ID label DEVICE_LOCATION… THINGSPEAK_PRIM… THINGSPEAK_PRIM…
## <chr> <chr> <chr> <chr> <chr>
## 1 98633 AAH Meadowlake outside 1295131 61K7OC25Z3F0CUK8
## 2 99813 AAH Wyne outside 1303443 EHROTVA5HI2WIY5B
## 3 133994 Eastwood Houston Te… outside 1543032 037RM00T3L4PC0FO
## 4 26659 Rice Military Div outside 702833 9NT4RUKKX2EG26BH
## # … with 39 more variables: THINGSPEAK_SECONDARY_ID <chr>,
## # THINGSPEAK_SECONDARY_ID_READ_KEY <chr>, latitude <dbl>, longitude <dbl>,
## # pm25 <dbl>, lastSeenDate <dttm>, sensorType <chr>, flag_hidden <lgl>,
## # flag_highValue <lgl>, isOwner <int>, humidity <dbl>, temperature <dbl>,
## # pressure <dbl>, age <int>, parentID <chr>, flag_attenuation_hardware <lgl>,
## # Ozone1 <chr>, pm25_current <dbl>, pm25_10min <dbl>, pm25_30min <dbl>,
## # pm25_1hr <dbl>, pm25_6hr <dbl>, pm25_1day <dbl>, pm25_1week <dbl>, …
The pat data quality can degrade over time. For a quick sanity check, we can use the pat_dailySoHIndexPlot() function to plot the daily State-of-Health index. This function plots both channels A and B with a daily State-of-Health index along the bottom.
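A sketch of that sanity check, applied to the Gymea Bay sensor downloaded earlier:

```r
#' plot channels A and B with the daily State-of-Health index
#' shown along the bottom of the figure
pat_dailySoHIndexPlot(pat_gymea_bay)
```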
Can we conclude that the smoke in the Sydney area was clearly very bad over our time period of interest?
Plot and evaluate the state-of-health for a Houston area sensor we just looked at.
Another way to look at the PurpleAir sensor data is to convert the pat into an airsensor object. The following chunk aggregates data from a pat object into an airsensor object with an hourly time axis.
Because we are dealing with values far outside the norm, we will use the PurpleAirQC_hourly_AB_00 function which performs minimal quality control. See RDocumentation of the package for more details.
Now that we have converted the relatively raw pat data into an airsensor object, we can use any of the "monitor" plotting functions found in the PWFSLSmoke or AirMonitorPlots packages.
Here, the bar associated with each day is colored by Air Quality Index (AQI). Over this time period there were only 10 days where the daily average AQI was below “Unhealthy”.
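As a sketch of the conversion and plot described above (the `airsensor_chisholm` name is illustrative; check each function's documentation for the full argument list):

```r
#' aggregate the raw pat time series into an hourly airsensor object,
#' applying minimal quality control since values are far outside the norm
airsensor_chisholm <- pat_createAirSensor(
  pat = pat_chisholm,
  FUN = PurpleAirQC_hourly_AB_00
)
#' daily barplot colored by AQI, via a PWFSLSmoke "monitor" function
monitor_dailyBarplot(airsensor_chisholm)
```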
To get a sense of what direction smoke is coming from, we use the sensor_pollutionRose() function. As the name implies, this function takes an airsensor object as an argument. It then obtains hourly wind direction and speed data from the nearest meteorological site and plots a traditional wind rose plot for wind direction and PM2.5.
In this case, it looks like the smoke is coming mostly from the E/NE, which is validated by a wind rose plot from the Australian Bureau of Meteorology.
Check the wind direction of your pat_houston object.
In this Intro to API workshop we (hopefully) learned what APIs are, requested data from the Socrata and PurpleAir APIs, and used the AirSensor package to investigate air quality monitor data.